{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e44c15a8",
   "metadata": {},
   "source": [
    "In this notebook, we will implement forward and backward propogation functions for a multi layered neural from scratch in Pytorch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a391aab5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "549572b7",
   "metadata": {},
   "source": [
    "### Forward propagation\n",
    "Forward Propagation, for arbitrary layer $l \\in \\left\\lbrace 0, L \\right\\rbrace$: \n",
    "$$\\vec{z}^{\\left(l\\right)} = W^{\\left(l\\right)}  \\vec{a}^{\\left(l-1\\right)} + \\vec{b}^{\\left(l\\right)}$$ \n",
    "$$\\vec{a}^{\\left(l\\right)} = \\sigma\\left( \\vec{z}^{\\left(l\\right)} \\right)$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7e99803e",
   "metadata": {},
   "outputs": [],
   "source": [
    "def Z(x, W_l, b_l):\n",
    "    \"\"\"\n",
    "    Args\n",
    "        x: 1-d vector. Activation of layer l-1 \n",
    "        Wl: Weight matrix of layer l\n",
    "        bl: Bias of layer l\n",
    "    \"\"\"\n",
    "    return torch.matmul(W_l, x) + b_l\n",
    "\n",
    "def A(z_l):\n",
    "    \"\"\"\n",
    "    Sigmoid activation function (non-linear layer)\n",
    "    \"\"\"\n",
    "    return torch.sigmoid(z_l)\n",
    "\n",
    "def forward(x, W, b):\n",
    "    \"\"\"\n",
    "    In the forward pass, we loop over every single layer, and perform forward propagation as\n",
    "    defined by the equation above\n",
    "    Args\n",
    "        x: 1-d input vector. Represents a single training data instance\n",
    "        W: List of weight matrices. From 0 to L\n",
    "        b: List of bias vectors. From 0 to L\n",
    "    \"\"\"\n",
    "    L = len(W) - 1\n",
    "    a_l = x\n",
    "    for l in range(0, L + 1):\n",
    "        z_l = Z(a_l, W[l], b[l])\n",
    "        a_l = A(z_l)\n",
    "    return a_l"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee675295",
   "metadata": {},
   "source": [
    "### Loss\n",
    "Here we are working with a single training data instance, $x_{i}$ whose GT output is $\\bar{y}_{i}$. \n",
    "\n",
    "$\\mathbb{L} = \\frac{1}{2} \\left( a^{ \\left( L \\right) } - \\bar{y}_{i} \\right)^{2}$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "25ca01be",
   "metadata": {},
   "outputs": [],
   "source": [
    "def mse_loss(a_L, y):\n",
    "    \"\"\"\n",
    "    Args\n",
    "        a_L: Activation of the last layer\n",
    "        y: Ground Truth\n",
    "    \"\"\"\n",
    "    return 1./ 2 * torch.pow((a_L - y), 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f34d9648",
   "metadata": {},
   "source": [
    "### Backpropagation\n",
    "\n",
    "Backpropagation, for last layer $L$ \n",
    "\n",
    "$$\\vec{\\delta}^{\\left( L \\right)} = \\left( \\vec{a}^{ \\left( L \\right) }  - \\bar{y} \\right) \\circ \\vec{a}^{\\left( L \\right)} \\circ \\left( \\vec{1} - \\vec{a}^{\\left( L \\right)}  \\right)$$\n",
    "\n",
    "$$\\nabla_{ W^{ \\left( L \\right) } } \\mathbb{L} = \\vec{ \\delta }^{ \\left( L \\right) } \\left( \\vec{ a }^{ \\left( L - 1 \\right) } \\right)^{T}$$\n",
    "\n",
    "$$\\nabla_{ b^{ \\left( L \\right) } } \\mathbb{L} = \\vec{ \\delta }^{ \\left( L \\right) }$$\n",
    "\n",
    "\n",
    "Backpropagation, for arbitrary layer $l \\in \\left\\lbrace 0, L-1 \\right\\rbrace$: \n",
    "\n",
    "$$\\vec{\\delta}^{ \\left( l \\right) } = \\left(\\left(  W^{ \\left( l+1 \\right) } \\right)^{T}  \\vec{ \\delta }^{ \\left( l+1 \\right) }\\right) \\circ \\vec{a}^{ \\left( l \\right) } \\circ \\left( \\vec{1} -  \\vec{a}^{ \\left( l \\right) } \\right)$$\n",
    "\n",
    "$$\\nabla_{ W^{ \\left( l \\right) } } \\mathbb{L} = \\vec{ \\delta }^{ \\left( l \\right) } \\left( \\vec{ a }^{ \\left( l - 1 \\right) } \\right)^{T}$$\n",
    "\n",
    "$$\\nabla_{ b^{ \\left( l \\right) } } \\mathbb{L} = \\vec{ \\delta }^{ \\left( l \\right) }$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "322bab2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def forward_backward(x, y, W, b):\n",
    "    L = len(W) - 1\n",
    "    a = []\n",
    "    # We are caching the output of the forward propagation of the intermediate layers\n",
    "    # to help with the calcuation of the gradients during backward propagation.\n",
    "    for l in range(0, L+1):\n",
    "        a_prev = x if l == 0 else a[l-1]\n",
    "        z_l = Z(a_prev, W[l], b[l])\n",
    "        a_l = A(z_l)\n",
    "        a.append(a_l)\n",
    "\n",
    "    print('Final activation', a[L])\n",
    "    loss = mse_loss(a[L], y)\n",
    "    print('Loss', loss)\n",
    "    \n",
    "    deltas = [None for _ in range(L + 1)]\n",
    "    W_grads = [None for _ in range(L + 1)]\n",
    "    b_grads = [None for _ in range(L + 1)]\n",
    "    \n",
    "    # Compute for the last layer\n",
    "    a_L = a[L]\n",
    "    deltas[L] = (a_L - y) * a_L * (1 - a_L)\n",
    "    W_grads[L] = torch.matmul(deltas[L], a[L - 1].T)\n",
    "    b_grads[L] = deltas[L]\n",
    "\n",
    "    for l in range(L-1, -1, -1):\n",
    "        a_l = a[l]\n",
    "        deltas[l] =  torch.matmul(W[l + 1].T, deltas[l + 1]) * a_l * (1 - a_l)\n",
    "        W_grads[l] = torch.matmul(deltas[l], a[l - 1].T)\n",
    "        b_grads[l] = deltas[l]\n",
    "        \n",
    "    for l in range(0, L + 1):\n",
    "        print('Layer: {}, Shapes - W: {}, W_grad: {}, b: {}, b_grad: {}, delta: {}'.format(\n",
    "                l, list(W[l].shape), list(W_grads[l].shape), \n",
    "                list(b[l].shape), list(b_grads[l].shape), list(deltas[l].shape)\n",
    "        ))\n",
    "    return loss, W_grads, b_grads"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4b4c4899",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.tensor([1, 2, 3, 4, 5, 6, 7], dtype=torch.float32).unsqueeze(dim=1)\n",
    "y = torch.tensor(7.9, dtype=torch.float32)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e92b3184",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Final activation tensor([[0.0739]])\n",
      "Loss tensor([[0.1933]])\n",
      "Layer: 0, Shapes - W: [5, 7], W_grad: [5, 1], b: [5, 1], b_grad: [5, 1], delta: [5, 1]\n",
      "Layer: 1, Shapes - W: [3, 5], W_grad: [3, 5], b: [3, 1], b_grad: [3, 1], delta: [3, 1]\n",
      "Layer: 2, Shapes - W: [1, 3], W_grad: [1, 3], b: [1, 1], b_grad: [1, 1], delta: [1, 1]\n"
     ]
    }
   ],
   "source": [
    "x = torch.randn(7, 1)\n",
    "y = torch.randn(1, 1)\n",
    "w0 = torch.randn(5,7)\n",
    "b0 = torch.randn(5,1)\n",
    "w1 = torch.randn(3,5)\n",
    "b1 = torch.randn(3,1)\n",
    "w2 = torch.randn(1,3)\n",
    "b2 = torch.randn(1,1)\n",
    "\n",
    "loss, W_grads, b_grads = forward_backward(x, y, [w0, w1, w2], [b0, b1, b2])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
